Search Results: "josch"

30 November 2014

Johannes Schauer: simple email setup

I was unable to find a good place that describes how to create a simple self-hosted email setup. The most surprising discovery was how much already works after:
apt-get install postfix dovecot-imapd
Right after finishing the installation I was able to receive email (though only into /var/mail in mbox format) and send email (though not from any other host). So while I expected a pretty complex setup, it turned out to boil down to adjusting a few configuration parameters.

Postfix

The two interesting files for configuring postfix are /etc/postfix/main.cf and /etc/postfix/master.cf. A commented version of the former ships as /usr/share/postfix/main.cf.dist. Alternatively, there is the massive (roughly 600k words) man page postconf(5). The latter file is documented in master(5).

/etc/postfix/main.cf

I changed the following in my main.cf:
@@ -37,3 +37,9 @@
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
+
+home_mailbox = Mail/
+smtpd_recipient_restrictions = permit_mynetworks reject_unauth_destination permit_sasl_authenticated
+smtpd_sasl_type = dovecot
+smtpd_sasl_path = private/auth
+smtp_helo_name = my.reverse.dns.name.com
At this point, also make sure that the parameters smtpd_tls_cert_file and smtpd_tls_key_file point to the right certificate and private key files. So either change these values or replace the content of /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key. The home_mailbox parameter sets the default path for incoming mail. Since there is no leading slash, this puts mail into $HOME/Mail for each user. The trailing slash is important as it specifies "qmail-style delivery", which means maildir. The default of the smtpd_recipient_restrictions parameter is permit_mynetworks reject_unauth_destination, so this just adds the permit_sasl_authenticated option, which is needed to allow users to send email once they have successfully verified their login through dovecot. The dovecot login verification is activated through the smtpd_sasl_type and smtpd_sasl_path parameters. I also had to set the smtp_helo_name parameter to the reverse DNS of my server because many other email servers only accept email from a server with a valid reverse DNS entry. My hosting provider charges USD 7.50 per month to change the default reverse DNS name, so the easy solution is to instead just adjust the name announced in the SMTP helo.
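These parameters can also be set and verified with postconf instead of editing main.cf by hand; a minimal sketch (the helo name is of course just a placeholder for your server's reverse DNS):
$ postconf -e 'home_mailbox = Mail/'
$ postconf -e 'smtpd_sasl_type = dovecot'
$ postconf -e 'smtpd_sasl_path = private/auth'
$ postconf -e 'smtp_helo_name = my.reverse.dns.name.com'
$ postconf smtpd_tls_cert_file smtpd_tls_key_file
The last command just prints the current values of the two TLS parameters so you can check where they point.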

/etc/postfix/master.cf

The file master.cf is used to enable the submission service. The following diff just removes the comment character from the appropriate section.
@@ -13,12 +13,12 @@
#smtpd pass - - - - - smtpd
#dnsblog unix - - - - 0 dnsblog
#tlsproxy unix - - - - 0 tlsproxy
-#submission inet n - - - - smtpd
-# -o syslog_name=postfix/submission
-# -o smtpd_tls_security_level=encrypt
-# -o smtpd_sasl_auth_enable=yes
-# -o smtpd_client_restrictions=permit_sasl_authenticated,reject
-# -o milter_macro_daemon_name=ORIGINATING
+submission inet n - - - - smtpd
+ -o syslog_name=postfix/submission
+ -o smtpd_tls_security_level=encrypt
+ -o smtpd_sasl_auth_enable=yes
+ -o smtpd_client_restrictions=permit_sasl_authenticated,reject
+ -o milter_macro_daemon_name=ORIGINATING
#smtps inet n - - - - smtpd
# -o syslog_name=postfix/smtps
# -o smtpd_tls_wrappermode=yes
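Once postfix has been reloaded, the now-active submission entry can be verified with postconf (the -M option assumes postfix 2.9 or later):
$ postconf -M | grep submission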

Dovecot

Since the above configuration changes made postfix store email in a different location and format than the default, dovecot has to be informed about these changes as well. This is done in /etc/dovecot/conf.d/10-mail.conf. The second configuration change, in /etc/dovecot/conf.d/10-master.conf, enables postfix to authenticate users through dovecot. For SSL one should look into /etc/dovecot/conf.d/10-ssl.conf and either adapt the parameters ssl_cert and ssl_key or store the correct certificate and private key in /etc/dovecot/dovecot.pem and /etc/dovecot/private/dovecot.pem, respectively. The dovecot-core package (which dovecot-imapd depends on) ships tons of documentation. The file /usr/share/doc/dovecot-core/dovecot/documentation.txt.gz gives an overview of the available resources. The path /usr/share/doc/dovecot-core/dovecot/wiki contains a snapshot of the dovecot wiki at http://wiki2.dovecot.org/. The example configurations seem to be the same files as those in /etc/, which are already well commented.

/etc/dovecot/conf.d/10-mail.conf

The following diff changes the default email location from an mbox in /var/mail to a maildir in ~/Mail, as configured for postfix above.
@@ -27,7 +27,7 @@
#
# <doc/wiki/MailLocation.txt>
#
-mail_location = mbox:~/mail:INBOX=/var/mail/%u
+mail_location = maildir:~/Mail

# If you need to set multiple mailbox locations or want to change default
# namespace settings, you can do it by defining namespace sections.
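Assuming dovecot 2.x, the effective value can be double-checked with doveconf:
$ doveconf mail_location
mail_location = maildir:~/Mail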

/etc/dovecot/conf.d/10-master.conf

And this enables the authentication socket for postfix:
@@ -93,9 +93,11 @@


# Postfix smtp-auth
- #unix_listener /var/spool/postfix/private/auth
- # mode = 0666
- #
+ unix_listener /var/spool/postfix/private/auth
+ mode = 0660
+ user = postfix
+ group = postfix
+

# Auth process is run as this user.
#user = $default_internal_user
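After dovecot has been reloaded, the socket should exist inside the postfix chroot; listing it should show a socket owned by postfix:postfix with mode 0660:
$ ls -l /var/spool/postfix/private/auth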

Aliases

Email will now automatically be put into the ~/Mail directory of the receiver. So a user has to be created for whom one wants to receive mail...
$ adduser josch
...and any aliases for it have to be configured in /etc/aliases.
@@ -1,2 +1,4 @@
-# See man 5 aliases for format
-postmaster: root
+root: josch
+postmaster: josch
+hostmaster: josch
+webmaster: josch
After editing /etc/aliases, the command
$ newaliases
has to be run. More can be read in the aliases(5) man page.
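A quick way to test both the alias expansion and the maildir delivery is to inject a mail locally with postfix's sendmail binary and then look for it in the maildir of the aliased user:
$ echo test | /usr/sbin/sendmail postmaster
$ ls ~josch/Mail/new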

Finishing up

Everything is done and now postfix and dovecot have to be informed about the changes. There are many ways to do that: either restart the services, reboot, or just run:
$ postfix reload
$ doveadm reload
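To check that the submission service is up and offering authentication, one can speak SMTP through openssl (after the TLS handshake, an EHLO should be answered with, among other lines, an AUTH line):
$ openssl s_client -connect localhost:587 -starttls smtp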
Have fun!

7 November 2014

Johannes Schauer: automatically suspending cpu hungry applications

TLDR: Using the awesome window manager: how to automatically send SIGSTOP and SIGCONT to application windows when they get unfocused or focused, respectively, so that applications do not waste CPU cycles while not in use.

I don't require any fancy looking GUI, so my desktop runs no full-blown desktop environment like Gnome or KDE but only awesome as a light-weight window manager. Usually, the only application windows I have open are rxvt-unicode as my terminal emulator and firefox/iceweasel with the pentadactyl extension as my browser. Thus, I would expect the CPU usage of my idle system to be pretty much zero, but instead firefox decides to constantly eat 10-15%, probably to update some GIF animations or JavaScript (or nowadays even HTML5 video animations). But I don't need it to do that when I'm not currently looking at my browser window. Disabling all JavaScript is not an option because some websites that I need for uni or work are completely broken without JavaScript, so I have to enable it for those websites.

Solution: send SIGSTOP when my firefox window loses focus and send SIGCONT once it gains focus again. The following addition to my /etc/xdg/awesome/rc.lua does the trick:
local capi = { timer = timer }
client.add_signal("focus", function(c)
    if c.class == "Iceweasel" then
        -- resume firefox as soon as its window gains the focus
        awful.util.spawn("kill -CONT " .. c.pid)
    end
end)
client.add_signal("unfocus", function(c)
    if c.class == "Iceweasel" then
        local timer_stop = capi.timer({ timeout = 10 })
        local send_sigstop = function ()
            timer_stop:stop()
            -- only stop firefox if the focus still belongs to another process
            if client.focus.pid ~= c.pid then
                awful.util.spawn("kill -STOP " .. c.pid)
            end
        end
        timer_stop:add_signal("timeout", send_sigstop)
        timer_stop:start()
    end
end)
Since I'm running Debian, the class is "Iceweasel" and not "Firefox". When the window gains focus, a SIGCONT is sent immediately. I'm executing kill because I don't know how to send UNIX signals from lua directly. When the window loses focus, the SIGSTOP signal is only sent after a 10 second timeout, so that brief focus changes (switching away and quickly back, or copying and pasting something) do not immediately freeze the browser.

With this change, when I now open htop, the process consuming most CPU resources is htop itself. Success! Another cool advantage is that firefox can now be moved completely into swap space in case I run otherwise memory hungry applications, without ever requiring any memory from swap until I really use it again. I haven't encountered any disadvantages of this setup yet. If 10 seconds prove to be too short to copy and paste, I can easily extend this delay. Even clicking on links in my terminal works flawlessly - the new tab will just only load once firefox gets focused again.

EDIT: thanks to Helmut Grohne for suggesting to compare the pid instead of the raw client instance to prevent misbehaviour when firefox opens additional windows like the preferences dialog.
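One way to observe the effect (assuming the process is called iceweasel) is to check the process state after the window has been unfocused for ten seconds - a STAT of T means the process is stopped:
$ ps -C iceweasel -o pid,stat,comm
Should anything ever go wrong, a manual kill -CONT with the respective pid brings the browser back to life.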

29 July 2014

Johannes Schauer: bootstrap.debian.net temporarily not updated

I'll be moving places twice within the next month and as I'm hosting the machine that generates the data, I'll temporarily suspend the bootstrap.debian.net service until maybe around September. Until then, bootstrap.debian.net will not be updated and will retain its state as of 2014-07-28. Sorry if that causes any inconvenience. You can write to me if you need help with manually generating the data bootstrap.debian.net provided.

5 June 2014

Johannes Schauer: botch updates

My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was four months ago and covered several incremental updates. This post is of a similar nature. The most interesting news is probably the additional data that bootstrap.debian.net now provides; it is listed in the next section. All subsequent sections then list the changes under the hood that made the additions to bootstrap.debian.net possible.

bootstrap.debian.net

The bootstrap.debian.net service used to have botch as a git submodule but now runs botch from its Debian package. This at least proves that the botch Debian package is mature enough to do useful stuff with. In addition to the bootstrapping results by architecture, bootstrap.debian.net now also hosts several additional services. Further improvements concern how dependency cycles are now presented in the html overviews. While before, vertices in a cycle were separated by commas as if they were simple package lists, vertices are now connected by unicode arrows. Dashed arrows indicate build dependencies while solid arrows indicate builds-from relationships. For what it's worth, installation set vertices now contain their installation set in their title attribute.

Debian package

Botch has long depended on features of an unreleased version of dose3 which in turn depended on an unreleased version of libcudf. Both projects have recently made new releases, so I was now able to drop the dose3 git submodule and rely on the host system's dose3 version instead. This also made it possible to create a Debian package of botch which currently sits at Debian mentors. Writing the package also finally made me create a usable install target in the Makefile as well as add stubs for the manpages of the 44 applications that botch currently ships. The actual content of these manpages still has to be written. The only documentation botch currently ships, in the botch-doc package, is an offline version of the wiki on gitorious. The new page ExamplesGraphs even includes pictures.

Cross

By default, botch analyzes the native bootstrapping phase. That is, it assumes that the initial set of Essential:yes and build-essential packages magically exists and finds out how to bootstrap the rest from there through native compilation. But part of the bootstrapping problem is also to create the set of Essential:yes and build-essential packages from nothing via cross compilation. Botch is unable to analyze the cross phase because too many packages cannot satisfy their crossbuild dependencies due to multiarch conflicts. This problem concerns only the dependency metadata and not whether a given source package actually crosscompiles fine in practice. Helmut Grohne has done great work with rebootstrap, which is regularly run by jenkins.debian.net. He convinced me that we need an overview of which packages are blocking the analysis of the cross case and that it would be useful to have a crossbuild order, even if it is a fake order, just to get a rough overview of the current situation in Debian Sid. I wrote a couple of scripts which run dose-builddebcheck on a repository, analyze which packages fail to satisfy their crossbuild dependencies and why, fix those cases by adjusting package metadata accordingly and repeat until all relevant source packages satisfy their crossbuild dependencies. The result of this can then be used to identify the packages that need to be modified as well as to generate a crossbuild order. The fixes to the metadata are done in an automatic fashion and do not necessarily reflect the real fix that would solve the problem. Nevertheless, I ended up agreeing that it is better to have a slightly wrong overview than no overview at all.

Minimizing the dependency graph size

Installation sets in the dependency graph are calculated independently from each other. If two binary packages provide A, then dependencies on A in different installation sets might choose different binary packages as providers of A. The same holds for disjunctive dependencies. If a package depends on A | C and another package depends on C | A, then there is no coordination to choose C so as to minimize the overall number of vertices in the graph. I implemented two methods to minimize the impact of cases where the dependency solver has multiple options to satisfy a dependency through Provides and dependency disjunctions.

The first method is inspired by Helmut Grohne. An algorithm goes through all disjunctive binary dependencies and removes all virtual packages, leaving only real packages. Of the remaining real packages, the first one is selected. For build dependencies, the algorithm drops all but the first package in every disjunction. This is also what sbuild does. Unfortunately this solution produces an unsatisfiable dependency situation in most cases. This is because oftentimes it is necessary to select the virtual disjunctive dependency because of a conflict relationship introduced by another package.

The second method involves aspcud, a cudf solver which can optimize a solution according to a given criterion. This solution is based on an idea by Pietro Abate, who implemented the basis for it back in 2012. In contrast to a usual cudf problem, binary packages now also depend on the source packages they build from. If we now ask aspcud to find an installation set for one of the base source packages (I chose src:build-essential) then it will return an installation set that includes source packages. As the optimization criterion, the number of source packages in the installation set is minimized. This solution would be flawless if there were no conflicts between binary packages. Due to conflicts, not all binary packages that must be coinstallable for this strategy to work can actually be coinstalled. The quick and dirty solution is to remove all conflicts before passing the cudf universe to aspcud. But this also means that the solution sometimes does not work in practice.

Test cases

Botch now finally has a test target in its Makefile. The test target tests two code paths of the native.sh script and the cross.sh script. Running these two scripts covers most parts of botch. Given that I did lots of refactoring in the past weeks, the test cases greatly helped to assure that I didn't break anything in the process. I also added autopkgtests to the Debian packaging which test the same things as the test target but naturally run the installed version of botch instead. The autopkgtests were a great help in weeding out some last bugs which made botch depend on being executed from its source directory.

Python 3

Following the suggestions in the Debian python policy, I evaluated the possibility of using Python 3 for the Python scripts in botch. While I was at it I added transparent decompression for gzip, bz2 and xz based on the file magic, replaced python-apt with python-debian because of bug#748922 and added argparse argument parsing to all scripts. Unfortunately I had to find out that Python 3 support does not yet seem to be possible for botch, for the following reasons:
  • no soap module for Python 3 in Debian (needed for bts access)
  • hash randomization is turned on by default in Python 3 and therefore the graph output of networkx is not deterministic anymore (bug#749710)
Thus I settled for changing the code such that it is compatible with Python 2 as well as with Python 3. Because of the changed string handling and sys.stdout properties in Python 3 this proved to be tricky. On the other hand it uncovered bugs in my code where I was wrongly relying on deterministic dictionary key traversal.
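The file-magic based decompression dispatch mentioned above is easy to illustrate in shell; the Python scripts do the equivalent internally (the file(1) output prefixes below are an assumption about your version of file):
decompress() {
    case "$(file --brief "$1")" in
        gzip*)  zcat  "$1" ;;
        bzip2*) bzcat "$1" ;;
        XZ*)    xzcat "$1" ;;
        *)      cat   "$1" ;;
    esac
}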

3 April 2014

Johannes Schauer: mapbender - maps for long-distance travels

Back in 2007 I stumbled over the "Plus Fours Routefinder", an invention of the 1920s. It's worn on the wrist and allows the user to scroll through a map of the route they planned to take, rolled up on little wooden rollers. At that point I thought: that's awesome for long trips where you either don't want to take electronics with you or where you are without any electricity for a long time. And creating such rollable custom maps of your route automatically using openstreetmap data should be a breeze! Nevertheless it seems nobody picked up the idea.

Years passed and in a few weeks I'll go on a biking trip along the Weser, a river in northern Germany. For my last multi-day trip (which was through the Odenwald, an area in southern Germany) I printed a big map from openstreetmap data which contained the whole route. Openstreetmap data is fantastic for this because in contrast to commercial maps it not only allows you to print just the area you need but also allows you to highlight your planned route and objects you would probably not find in most commercial maps, like supermarkets to stock up on supplies or bicycle repair shops. Unfortunately such big maps have the disadvantage that, to show everything in the desired amount of detail along your route, they have to be pretty huge and thus easily become an inconvenience: the local plotter can't handle paper as large as DIN A0, and it's a pain to repeatedly fold and unfold the whole thing every time you want to look at it. Strong winds are also no fun with a huge sheet of paper in your hands. One solution would be to print DIN A4 sized map regions in the desired scale. But that has the disadvantage that either you find yourself going back and forth between subsequent pages because you happen to be right at the border between two pages, or you have to print sufficiently large overlaps, resulting in many duplicate map pieces and more pages of paper than you would like to carry with you.

It was then that I remembered the "Plus Fours Routefinder" concept. Given a predefined route it only shows what's important to you: all things close to the route you plan to travel along. Since it's a long continuous roll of paper there is no problem with folding because as you travel along the route you unroll one end and roll up the other. And because it's a long continuous map there is also no need for flipping pages or large overlap regions. There is not even the problem of not finding a big enough sheet of paper because multiple DIN A4 sheets can easily be glued together at their ends to form a long roll.

On the left you see the route we want to take: the bicycle route along the Weser river. If I wanted to print that map at a scale that allows me to see objects in sufficient detail along our route, then I would also see objects in Hamburg (upper right corner) in the same amount of detail. Clearly a waste of ink and paper, as the route never even comes close to Hamburg.
As the first step, a smooth approximation of the route has to be found. It seems that the best way to do that is to calculate a B-Spline curve approximating the input data with a given smoothness. On the right you can see the approximated curve with a smoothing value of 6. The curve is sampled into 20 linear segments. I calculated the B-Spline using the FITPACK library to which scipy offers a Python binding.
The next step is to expand each of the line segments into quadrilaterals. The distance between the vertices of the quadrilaterals and the ends of the line segment they belong to is the same along the whole path and obviously has to be big enough such that every point along the route falls into one quadrilateral. In this example, I draw only 20 quadrilaterals for visual clarity. In practice one wants many more for a smoother approximation.
Using a simple transform, each point of the original map and the original path in each quadrilateral is then mapped to a point inside the corresponding "straight" rectangle. Each target rectangle has the height of the line segment it corresponds to. It can be seen that while the large scale curvature of the path is lost in the result, fine details remain perfectly visible. The assumption here is that while travelling a path several hundred kilometers long, it does not matter that large scale curvature, which one is not able to perceive anyways, is not preserved. The transformation is done on a Mercator projection of the map itself as well as of the path data. Therefore, this method probably doesn't work if you plan to travel to one of the poles.

Currently I transform openstreetmap bitmap data. This is not quite optimal as it leads to text on the map being distorted. It would be just as easy to apply the necessary transformations to raw openstreetmap XML data, but unfortunately I didn't find a way to render the resulting transformed map data as a raster image without setting up a database. I would've thought that it would be possible to have a standalone program reading openstreetmap XML and dumping out raster or svg images without a round trip through a database. Furthermore, tilemill, one of the programs that seems to be among the least troublesome to set up for producing raster images, is stuck in an ITP and the existing packaging attempt fails to produce a non-empty binary package. Since I have no clue about nodejs packaging, I wrote about this to the pkg-javascript-devel list. Maybe I can find a kind soul to help me with it.
Here is a side by side overview that doesn't include the underlying map data but only the path. It shows how small details are preserved in the transformation. The code that produced the images in this post is very crude, unoptimized and kinda messy. If you don't care, then it can be accessed here.

6 February 2014

Johannes Schauer: botch updates

My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was three months ago with the announcement of bootstrap.debian.net. Since then a number of things happened, so I thought an update was due.

New graphs for port metrics

By default, a dependency graph is created by arbitrarily choosing an installation set for binary package installation or source package compilation. Installation set vertices and source vertices are connected according to this arbitrary selection. Niels Thykier approached me at Debconf13 about the possibility of using this graph to create a metric which would be able to tell, for each source package, how many other source packages would become uncompilable or how many binary packages would become uninstallable if that source package was removed from the archive. This could help in deciding about the importance of packages. More about this can be found in the thread on debian-devel. For botch, this meant that two new graphs can now be generated. Instead of picking an arbitrary installation set for compiling a source package or installing a binary package, botch can now create a minimum graph, computed by letting dose3 calculate strong dependencies, and a maximum graph, computed by using the dependency closure.

Build profile syntax in dpkg

With dpkg 1.17.2 we now have experimental build profile support in unstable. The syntax which ended up being added was:
Build-Depends: large (>= 1.0), small <!profile.stage1>
But until packages with that syntax can hit the archive, a few more tools need to understand the syntax. The patch we have for sbuild is very simple because sbuild relies on libdpkg for dependency parsing. We have a patch for apt too, but we have to rebase it for the current apt version and have to adapt it so that it works exactly like the functionality dpkg implements. But before we can do that we have to decide how to handle mixed positive and negative qualifiers or whether to remove this feature altogether because it causes too much confusion. The relevant thread on debian-dpkg starts here.

Update to latest dose3 git

Botch heavily depends on libdose3 and unfortunately requires features which are only available in the current git HEAD. The latest version packaged in Debian is 3.1.3 from October 2012. Unfortunately the current dose3 git HEAD also relies on unreleased features of libcudf. On top of that, the GraphML output of the latest ocamlgraph version (1.8.3) is broken as well and only fixed in git. For now everything is set up as git submodules, but this is the major blocker preventing any packaging of botch as a Debian package. Hopefully new releases will be made soon for all involved components.

Writing and reading GraphML

Botch is a collection of several utilities which are connected together in a shell script. The advantage of this is that one does not need to understand or hack the OCaml code to use botch for different purposes. In theory it also allows inserting 3rd party tools into a pipe to further modify the data. Until recently this ability was seriously hampered by the fact that many botch tools communicated with each other through marshaled OCaml binary files, which prevent everything not written in OCaml from modifying them. The data that was passed around like this were the dependency graphs, and I initially implemented it that way because I didn't want to write a GraphML parser. I have now written an xmlm based GraphML parser, so as of now botch only reads and writes ASCII text files, in XML (for the graphs) and in rfc822 packages format (for Packages and Sources files), both of which can easily be modified by 3rd party tools. The ./tools directory contains many Python scripts using the networkx module to modify GraphML and the apt_pkg module to modify rfc822 files.

Splitting of tools

To further increase the ability to modify program execution without having to know OCaml, I split up some big tools into multiple smaller ones. Some of the smaller tools are now even written in Python, which is probably much more hackable for the general crowd. I converted those tools to Python which did not need any dose3 functionality and which were simple enough that writing them didn't take much time. I could convert more tools, but that might introduce bugs and takes time which I currently don't have much of (who does?).

Gzip instead of bz2

Since around January 14, snapshot.debian.org doesn't offer bzip2 compressed Packages and Sources files anymore but uses xz instead. This is awesome for most purposes but unfortunately I had to discover that there exist no OCaml bindings for liblzma (the xz library). Thus, botch is now using gzip instead of bz2 until either myself or anybody else finds some time to write such an OCaml binding.

Self hosting Fedora

Paul Wise made me aware of Harald Hoyer's attempts to bootstrap Fedora. I reproduced his steps for the Debian dependency graph and it turns out that the Debian graphs are a little bit bigger. I'm exchanging emails with Harald Hoyer because it might not be too hard to use botch for rpm based distributions as well, since dose3 supports rpm. The article also made me aware of the tred tool, which is part of graphviz and calculates the transitive reduction of a graph. This can help to make horrible situations much better.
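Since tred reads plain dot input, its effect is easy to try out; in this sketch the edge a -> c is redundant because it is implied by a -> b -> c, so tred drops it and prints roughly:
$ echo 'digraph g { a -> b; b -> c; a -> c; }' | tred
digraph g {
    a -> b;
    b -> c;
}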

Dose3 bugs

I planned to generate such simplified graphs for the neighborhood of each source package on bootstrap.debian.net, but then binutils stopped building the binutils-gold binary package and instead provides binutils-gold, while libc6-dev breaks binutils-gold (<< 2.20.1-11). This unfortunately triggered a dose3 bug, and thus bootstrap.debian.net will not generate any new results until this is fixed in dose3. Another dose3 bug affects packages which Conflicts/Replaces/Provides: bar while bar is fully virtual and which are Multi-Arch: same. Binaries of different architecture with this property can currently not be co-installed with dose3. Unfortunately linux-libc-dev has this property, so botch cannot be used to analyze cross builds until that bug is fixed in dose3. I hope I get some free time soon to look at these dose3 issues myself.

More documentation

Since I started to like the current set of tools and how they work together, I ended up writing over 2600 words of documentation in the past few days. You can start setting up and running botch by reading the first steps and get more detailed information by reading about the 28 tools that botch makes use of as of now. All existing articles, theses and talks are linked from the wiki home.

11 January 2014

Johannes Schauer: Why do I need superuser privileges when I just want to write to a regular file

I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started with putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a Marvell Kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900, whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down. Creating a Debian rootfs disk image for all these devices basically follows the same steps:
  1. create a disk image file, partition it, format the partitions and mount the / partition into a directory
  2. use debootstrap or multistrap to extract a selection of armel or armhf packages into the directory
  3. copy over /usr/bin/qemu-arm-static for qemu user mode emulation
  4. chroot into the directory to execute package maintainer scripts with dpkg --configure -a
  5. copy the disk image onto the sd card
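A condensed sketch of steps one to four might look as follows (sizes, suite and device names are placeholders); note how the second half only works with superuser privileges because of the loop mounting and the chroot:
$ dd if=/dev/zero of=disk.img bs=1M count=1024
$ parted -s disk.img mklabel msdos
$ parted -s disk.img mkpart primary ext3 1MiB 100%
$ sudo kpartx -av disk.img
$ sudo mkfs.ext3 /dev/mapper/loop0p1
$ sudo mount /dev/mapper/loop0p1 /mnt
$ sudo debootstrap --arch=armel --foreign wheezy /mnt
$ sudo cp /usr/bin/qemu-arm-static /mnt/usr/bin/
$ sudo chroot /mnt dpkg --configure -a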
It was not long until I started wondering why I had to run all of the above steps with superuser privileges even though everything except the final step (which I will not cover here) was in principle nothing else than writing some magic bytes, in some more or less fancy ways, to files I had write access to (the disk image file). So I tried using fakeroot+fakechroot and after some initial troubles I managed to build a foreign architecture rootfs without needing root privileges for steps two, three and four. I wrote about my solution, which still included some workarounds, in another article here. These workarounds were soon not needed anymore as upstream fixed the outstanding issues. As a result I wrote the polystrap tool which combines multistrap, fakeroot, fakechroot and qemu user mode emulation. Recently I managed to integrate proot support in a separate branch of polystrap.

Last year I got the LEGO ev3 robot for christmas and since it runs Linux I also wanted to put Debian on it by following the instructions given by the ev3dev project. Even though ev3dev calls itself a "distribution" it only deviates from pure Debian by its kernel, some configuration options and its initial package selection. Otherwise it's vanilla Debian. The project also supplies some multistrap based scripts which create the rootfs and then partition and populate an SD card. All of this is of course done as the superuser.

While the creation of the file/directory structure of the foreign Debian armel rootfs can by now easily be done without superuser privileges by running multistrap under fakeroot/fakechroot/proot, creating the SD card image still seems to be a bit more tricky. While it is no problem to write a partition table to a regular file, it turned out to be tricky to mount these partitions because tools like kpartx and losetup require superuser permissions. Tools like mkfs.ext3 and fuse-ext2, which otherwise would be able to work on a regular file without superuser privileges, do not seem to allow specifying the offsets that the partitions have within the disk image. With fuseloop there exists a tool which allows to "loop-mount" parts of a file in userspace to a new file and thus allows tools like mkfs.ext3 and fuse-ext2 to work as they normally do. But fuseloop is not packaged for Debian yet and thus also not in the current Debian stable. An obvious workaround would be to create and fill each partition in a separate file and concatenate them together. But why do I have to write my data twice just because I do not want to become the superuser? Even worse: because parted refuses to write a partition table to a file which is too small to hold the specified partitions, one spends twice the disk space of the final image: the image with the partition table plus the image with the main partition's content.

So let's summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory and, given enough free disk space, there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits?
Fortunately it seems that the (at least in my opinion) hardest part, faking chroot and executing foreign architecture package maintainer scripts, is already possible without superuser privileges by using fakeroot and fakechroot or proot together with qemu user mode emulation. But then there is still the blocker of creating the disk image itself through some user mode loop mounting of a filesystem occupying a virtual "partition" in the disk image. Why has all this only become available so very recently, and why does it still require a number of workarounds to fully work in userspace?

There exists a surprising number of scripts which wrap debootstrap/multistrap. Most of them require superuser privileges. Does everybody just accept that they have to put a sudo in front of every invocation and hope for the best? While this might be okay for well tested code like debootstrap and multistrap, the countless wrapper scripts might accidentally (be it a bug in the code or a typo in the given command line arguments) write to your primary hard disk instead of your SD card. Such behavior can easily be mitigated by not executing any such script with superuser privileges in the first place.

Operations like loop mounting affect the whole system. Why do I have to touch anything outside of my home directory (/dev/loop in this case) to populate a file inside it with some meaningful bits? Virtualization is no option because every virtualization solution again requires root privileges. One might argue that a number of solutions just require some initial setup by root to then later be used by a regular user (for example /etc/fstab configuration or the schroot approach). But then again: why do I have to write anything outside of my home directory (even if it is only once) to be able to write something meaningful to a file inside it? The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting.

Given all these downsides, why is it still so common to just assume that one is able and willing to use sudo and be done with it in most cases? I really wonder why technologies like fakeroot and fakechroot have only been developed this late. Has this problem not been around since the earliest days of Linux/Unix? Am I missing something and rambling on for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?

5 November 2013

Johannes Schauer: announcing bootstrap.debian.net

The following post is a verbatim copy of my message to the debian-devel list.

While botch produces loads of valuable data to help maintainers modify the right source packages with build profiles and thus make Debian bootstrappable, it has so far failed at producing this data in an easily usable format. While human readability is probably still lacking (it's hard to write about a complicated topic you are very familiar with in a manner understandable by everybody), the bootstrapping results are now generated automatically (on a daily basis) and published on a per-source-package basis as well. Thus let me introduce to you: bootstrap.debian.net

Paul Wise encouraged me to set this up and also donated the debian.net CNAME to my server. Thanks a lot! The data is generated daily from the midnight packages/sources records of snapshot.debian.org (I hope it's okay to grab the data from there on a daily basis). The resulting data can be viewed in HTML format (with some javascript to allow table sorting and paging, in case you use javascript) per architecture (here for amd64). In addition it also produces HTML pages per source package for all source packages which are involved in a dependency cycle in any architecture (for example src:avahi or src:python2.7). Currently there are 518 source packages involved in a dependency cycle. While this number seems high, remember that estimations by calculating a feedback arc set suggest that only 50-60 of these source packages have to be modified with build profiles to make the whole graph acyclic.

For now it is funny to see that the main architectures do not bootstrap (since July, for different reasons) while less popular ones like ia64 and s390x bootstrap right now without problems (at least in theory). According to the logs (also accessible at above link, here for amd64) this is because gcc-4.6 currently fails to compile due to a build-conflict (this has been reported in bug#724865). Once this bug is fixed, all arches should be bootstrappable again. Let me remind you here that the whole analysis is done on the dependency relationships only (not a single source package is actually compiled at any point) and compilation might fail for many other reasons in practice.

It has been Paul Wise's idea to integrate this data into the pts so that maintainers of affected source packages can react to the heuristics suggested by botch. For this purpose, the website also publishes the raw JSON data from which the HTML pages are generated (here for amd64). The bugreport against the bts can be found in bug#728298.

I'm sure that a couple of things regarding understandability of the results are not yet sufficiently explained or outright missing. If you see any such instance, please drop me a mail suggesting what to change in the textual description or presentation of the results. I also created two wiki pages to give an overview of the utilized terminology; feel free to tell me if anything in these pages is unclear. Direct patches against the python code producing the HTML from the raw JSON data are also always welcome.

11 July 2013

Johannes Schauer: Using botch to generate transition build orders

In October 2012, Joachim Breitner asked me whether botch (well, it didn't have a name back then) could also be used to calculate a build order for recompiling ~450 haskell-packages with a new ghc version (it was probably the 7.6.1 release) to upload them to experimental. What is still blocking this ability is botch's inability to directly read *.dsc files instead of having to rely on Packages and Sources files. On the other hand (in case a set of Packages and Sources files exists) it is now much easier to use botch for such a use case for which it was not originally designed.

To demonstrate how botch can be used to calculate the build order for library transitions, I wrote the script create-transition-order.sh which executes the individual tools in the correct order with the correct arguments. To validate the correctness of botch, I compared the output to the order which ben produces for transitions. The create-transition-order.sh script is called with a ben query string and optionally a snapshot.debian.org timestamp as its arguments. The script relies on ben being installed because ben query strings cannot be translated into grep-dctrl query strings: ben query splits fields which contain comma separated values at the comma before searching for strings. Unfortunately the script also currently relies on a yet unreleased ben feature which allows ben download to create a ben.cache file. You can track the progress of this feature as bug#714703. Creating a ben.cache file is necessary because some queries rely on an association between binary and source packages which is not present for all packages in a Packages file. For example the haskell transition query makes use of this feature by including the .source field in its query.

The result of these trials was that botch produced nearly the same build order for nearly all transitions. The only differences are due to shortcomings of ben and botch. For example, ben is not able to create an order for transitions where the involved packages form one or more cycles. A prominent example is the haskell transition where ghc itself is only built during step 15, after many other packages which would have needed ghc to be compiled. Botch solves this problem by reducing all strongly connected components in a cyclic graph to a single vertex before creating the order. This operation makes the graph acyclic and creating a build order trivial. The only remaining problems can then be found within the strongly connected components (for Haskell they are of maximum size two) but the overall build order is correct.

On the other hand, botch has no notion of packages which are affected by a transition and thus creates a build order which in some cases is longer than the one created by ben. When generating the interdependencies between packages, ben only considers those which are part of the transition. Botch on the other hand considers all dependency relationships. It would be simple to solve this issue in botch by removing unaffected packages from the dependency graph through edge contraction (an operation already used by botch for other tasks).

This exercise also let me find another bug in dose3 where libdose would not associate a binary package carrying a binNMU version with its source package but instead report a version mismatch error. This problem was also reported in the process.
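As an aside, the effect of cycles on ordering is easy to reproduce with coreutils' tsort, which topologically sorts an edge list and complains as soon as the input contains a cycle (package names made up):
$ printf 'ghc text\ntext ghc\nghc parsec\n' | tsort
tsort: -: input contains a loop:
tsort: ghc
tsort: text
GNU tsort then still emits a best-effort order; contracting each strongly connected component to a single vertex, as botch does, is what makes a proper topological order possible again.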

5 June 2013

Johannes Schauer: Reliable IMAP synchronization with IDLE support

The task: reliably synchronize my local MailDir with several remote IMAP mailboxes, with IDLE support, so that there is no need to poll at small time intervals to get new email immediately.

Most graphical mail clients like icedove/thunderbird or evolution have IDLE support, which means that their users get notified about new email as soon as it arrives. I prefer to handle email fetching/reading/sending using separate programs, so after having used icedove for a long time, I switched to an offlineimap/mutt based setup a few years ago. Some time after that I discovered sup and later notmuch/alot, which made me switch again. Now my only remaining problem is the synchronization between my local email and several remote IMAP servers.

Using offlineimap worked fine in the beginning but soon I discovered some of its shortcomings. It would for example sometimes lock up or simply crash when I switched between different wireless networks or switched to ethernet or UMTS. Crashing was not a big problem as I just put it into a script which re-executed it every time it crashed. The problem was it locking up for one of the synchronized email accounts while the others were kept in sync as usual. This once led to me missing five days of email because offlineimap was not crashing and I believed everything was fine (new messages from the other accounts were scrolling by as usual) while people were sending me worried emails asking whether I was okay or if something bad had happened. I nearly missed a paper submission deadline and another administrative deadline due to this. It was insanely annoying. It turned out that other mail synchronization programs suffered from the same lockup problem, so I stuck with offlineimap and instead executed it as:
while true; do
    timeout --signal=KILL 1m offlineimap
    notmuch new
    sleep 5m
done
This synchronizes my email every five minutes and kills offlineimap if a synchronization takes more than one minute. While I would've liked an email synchronizer which did not need this measure, it worked fine over the months.

After a while it bugged me that icedove (which my girlfriend is using) was receiving email nearly instantaneously after it arrived and I couldn't have this feature with my setup. This instantaneous arrival of email works by using the IMAP IDLE command, which allows the server to notify the client once new email arrives instead of the client having to poll for new email at small intervals. Unfortunately offlineimap (and every other email synchronizer I found) does not support the IDLE command. There is a fork of it which supports IDLE by using a newer python imap library, but this is of little use to me as it offers no hook which executes when new email arrives, so I could not run notmuch new on the new email. At this point I could've used inotify to execute notmuch new upon arrival of new email, but I went another way. Here is a short python script idle.py which connects to my IMAP servers and sends the IDLE command:
import imaplib
import select
import sys

class mysock():
    def __init__(self, name, server, user, passwd, directory):
        self.name = name
        self.directory = directory
        self.M = imaplib.IMAP4(server)
        self.M.login(user, passwd)
        self.M.select(self.directory)
        # enter IDLE mode by hand since imaplib does not support it
        self.M.send("%s IDLE\r\n" % (self.M._new_tag()))
        if not self.M.readline().startswith('+'):  # expect continuation response
            sys.exit(1)

    def fileno(self):
        # expose the raw socket so select() can wait on this object
        return self.M.socket().fileno()

if __name__ == '__main__':
    sockets = [
        mysock("Name1", "imap.foo.bar", "user1", "pass1", "folder1"),
        mysock("Name1", "imap.foo.bar", "user1", "pass1", "folder2"),
        mysock("Name2", "imap.blub.bla", "user2", "pass2", "folder3"),
    ]
    readable, _, _ = select.select(sockets, [], [], 1980)  # 33 mins timeout

    found = False
    for sock in readable:
        if sock.M.readline().startswith('* BYE '):
            continue
        # print offlineimap arguments for a quick check of just this folder
        print "-u basic -o -q -f %s -a %s" % (sock.directory, sock.name)
        found = True
        break

    if not found:
        # nothing happened: print arguments for a full synchronization
        print "-u basic -o"
It creates a connection to every directory I want to watch on every email account I want to synchronize. The select call will expire after 33 minutes as most email servers would drop the connection if nothing happens for around 30 minutes. If something happened within that time, the script outputs the arguments for offlineimap to do a quick check on just that mailbox on that account. If nothing happened, the script outputs the arguments for offlineimap to do a full check on all my mailboxes.
while true; do
    args=$(timeout --signal=KILL 35m python idle.py) || {
        echo "idle timed out" >&2;
        args="-u basic -o";
    }
    echo "call offlineimap with: $args" >&2
    timeout --signal=KILL 1m offlineimap $args || {
        echo "offlineimap timed out" >&2;
        continue;
    }
    notmuch new
done
Every call which interacts with the network is wrapped in a timeout command to avoid any funny effects. Should the python script time out, a full synchronization with offlineimap is triggered. Should offlineimap time out, an error message is written to stderr and the script continues. The above naturally has the disadvantage of not immediately responding to new email which arrives during the time that idle.py is not running. But as this email will be fetched once the next message arrives on the same account, there is not much waiting time, and so far this problem didn't bite me. Is there a better way to synchronize my email and at the same time make use of IDLE? I'm surprised I didn't find software offering this feature.

28 May 2013

Johannes Schauer: New bootstrap analysis results

I wrote a small wiki page for botch. It includes links to some hopefully useful resources like my FOSDEM 2013 talk or my master thesis about botch.

The final goal of botch is to generate a build order with which Debian can be bootstrapped from zero to more than 18000 source packages. But such a build order can only be generated once enough source packages (by now the number is around 70) incorporate build profiles which allow compiling them with fewer build dependencies and thereby break all build dependency cycles. So an intermediate goal of botch is to find a "good" (i.e. close to minimal) selection of such build dependencies that should be dropped from their source packages.

So far, the results of all heuristics we use for this task were just dumped to standard output. Since this textual format was hard for a human developer to read and even harder for a machine, the tool now outputs its results in JSON. This data is then converted by another tool to a more human readable format. One such format is HTML, and with the help of javascript for sortable and pageable tables the data can be presented without producing too much visual clutter (at least compared to the initial textual format). Another advantage of HTML is that the data can now easily be shared with other developers without them having to run botch or install any other program first. So without further ado, please find the result of running our heuristics on Debian Sid as of 2013-01-01 here: http://mister-muffin.de/bootstrap/stats/

Here is a screenshot of what you can expect (you can see part of the table of edges with most cycles through them). As a developer uses this information to add build profiles to source packages, this HTML page can be regenerated and will show which strongly connected components are left in the graph and how to best make them acyclic as well.

Besides other improvements, I also updated the data I used to generate the graphs from this post. I found this update to be necessary since we found and fixed several implementation bugs since October 2012, so the new plot should be more accurate. Here is a graph which shows the number of vertices in the biggest strongly connected component of Debian Sid by date. Some datapoints are missing for dates on which important source packages in Sid were not compilable.

16 April 2013

Johannes Schauer: botch - the debian bootstrap software has a name

After nearly one year of development, the "Port bootstrap build-ordering tool" now finally has a name: botch. It stands for Bootstrap/Build Order Tool CHain. Now we don't have to call it "Port bootstrap build-ordering tool" anymore (nobody did that anyways) and also don't have to refer to it as "the tools" in email, IRC or in my master thesis, which is due in a few weeks. With this, the naming issue also doesn't block the creation of a Debian package anymore. Since only a handful of people have a clone of it anyway, I also renamed the gitorious git repository url (and updated all links in this blog and left a text file informing about the name change in the old location). The new url is: https://gitorious.org/debian-bootstrap/botch

There have also been several further improvements since my last post in January. In the meantime, a paper about the theoretical aspects of this topic which Pietro Abate and I submitted to the CBSE 2013 conference got accepted (hurray!) and I will travel to the conference in Canada in June. Botch (yeey, I can call it by a name!) is also an integral part of one of the proposals for a Debian Google Summer of Code project this year, mentored by Wookey and co-mentored by myself. Let's hope it gets accepted and produces good results at the end of the summer!

1 February 2013

Johannes Schauer: Bootstrappable Debian FOSDEM talk

I will give a talk about the current status of automating Debian bootstrapping at FOSDEM 2013. There will also be a live demo of the developed toolchain. The talk will start at 16:30 on Saturday, 2 February 2013 in room H.1302 as part of the Cross distro devroom track and will be half an hour long. Directly after my talk, at 17:00, Wookey will give his talk about bootstrapping Debian/Ubuntu for arm64 in the same location. The current version of the slides can be downloaded here. The latex beamer source is available in the git repository of the debian bootstrap project. If you can't make it to FOSDEM, then my last post about the topic gives away most of what I will be talking about.

25 January 2013

Johannes Schauer: Bootstrappable Debian - New Milestone

This post is about the port bootstrap build ordering tool (naming suggestions welcome) which was started as a Debian Google Summer of Code project in 2012 and continued to be developed afterwards. Sources are available through gitorious. At the end of November 2012, I managed to put down an approximation algorithm for the feedback arc set problem which made it possible to break the dependency graph into a directed acyclic graph with only few removed build dependencies. I wrote about this effort on our mailinglist but didn't mention it here because it was still too much of a proof-of-concept. Later, in January 2013, I mentioned the result of this algorithm in an email wookey and I wrote to the debian-devel mailinglist. Many things have happened since November 2012:

Processing pipeline instead of monolithic tools

The tools I developed so far tried to accomplish everything by themselves, reusing functionality implemented in a central library. Therefore, trying out even trivial new things mostly meant hacking some OCaml code. Pietro Abate suggested to instead develop smaller tools which could work independently of each other, would only execute one algorithm each and could easily be connected together in different ways to achieve different effects. This switch is now done and all functionality from the old tools has moved into a new toolset. The exchange format between the tools is either plain text files in deb822 control format (Packages/Sources files) or a dependency graph. The dependency graph is currently marshalled by OCaml but future versions should work by just passing a GraphML (an XML graph format) representation around. This new way of doing things seems close to the UNIX philosophy (each program does one thing well, data is stored as text, every program is a filter). For example the deb822 control output can easily be manipulated using grep-dctrl(1), as shown below, and there exist many tools which can read, analyze and manipulate GraphML. It is a big improvement over the old, monolithic tools which did not allow manipulating any intermediate result with external, existing tools. Currently, a shell script (native.sh) executes all tools in a meaningful order, connecting them together correctly. The same tools will be used for a future cross.sh but they will be connected differently. I wrote about a first proposal of what the individual tools should do and how they should be connected in this email, from which I also linked a confusing overview of the pipeline. This overview has recently been improved to be even more confusing, and the current version of the lower half (the native part) can be seen in the pipeline figure. Solid arrows represent a flow of binary packages, dotted arrows represent a flow of source packages, ovals represent a set of packages, boxes with rounded corners represent set operations, rectangular boxes represent filters. There is only one input to a filter, which is the arrow connected to the top of the box. Outgoing arrows from the bottom represent the filtered input. Ingoing arrows to either side are arguments to the filter and control how the filter behaves depending on the algorithm. I will explain this better once the pipeline has proven to be less in flux. The pipeline is currently executed like this by native.sh.
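To pick just one example of such third party filtering (field and pattern chosen arbitrarily), grep-dctrl can extract the names of all source packages build-depending on ghc from a Sources file:
$ grep-dctrl -s Package -F Build-Depends ghc Sources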

Two new ways to break dependency cycles have been discovered

So far, we knew of three ways to formally break dependency cycles:
  • remove build dependencies through build profiles
  • find out that the build dependency is only used to build arch:all packages and therefore put it into Build-Depends-Indep
  • cross compile some source packages
Two new methods can be added to the above:

Dependency graph definition changed Back in September when I was visiting IRILL, Pietro found a flaw in how the dependency graph used to be generated. He supplied a new definition of the dependency graph which does away with the problem he found. After fixing some small issues with his code, I changed all the existing algorithms to use the new graph definition. As of today, the old graph is removed from the repository. Thanks to Pietro for supplying the new graph definition - I must still admit that my OCaml-fu is not strong enough to have come up with his code.

Added complexity for profile built source packages As mentioned in the introduction, Wookey and I addressed the debian-devel list with a proposal on necessary changes for an automated bootstrapping of Debian. During the discussion, two important things came up which have to be considered by the dependency graph algorithms:
  • profile built source packages may not create all binary packages
  • profile built source packages may need additional build dependencies
Both things make it necessary to alter the dependency graph during the generation of a feedback arc set and feedback vertex set beyond simply removing edges. Luckily, the developed approximation algorithms can be extended to support such changes in the graph.
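In graph terms, the two effects could look like the following sketch (Python with networkx; all function and parameter names are made up for illustration and do not appear in the actual OCaml tools):

import networkx as nx

def profile_build(g, src, dropped_deps, lost_binaries, extra_deps):
    # build dependency edges that the profile makes droppable disappear
    g.remove_edges_from((src, dep) for dep in dropped_deps)
    # binary packages not built in this profile leave the graph entirely,
    # together with every edge pointing to them
    g.remove_nodes_from(lost_binaries)
    # additional build dependencies of the profile build become new edges
    g.add_edges_from((src, dep) for dep in extra_deps)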

Different feedback arc set algorithms The initially developed feedback arc set algorithm is well suited to discover build dependency edges which should be dropped. It performs far worse when creating the final build order because it only considers edges in isolation and not how many edges of a source package can be dropped by profile building it. The adjusted algorithm for generating a build order is more of a feedback vertex set algorithm because instead of greedily finding the edge with the most cycles through it, it greedily finds the source package which would break the most cycles if it was profile built.
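The following is a rough sketch of that greedy idea in Python with networkx, not the actual OCaml implementation; enumerating all cycles in every round is of course far too expensive on the real graph, which is why approximations are used there:

import networkx as nx
from collections import Counter

def profile_build_candidates(g, is_source):
    # g: directed dependency graph, is_source: predicate for source nodes
    candidates = []
    while not nx.is_directed_acyclic_graph(g):
        counts = Counter()
        for cycle in nx.simple_cycles(g):
            counts.update(v for v in cycle if is_source(v))
        best = counts.most_common(1)[0][0]
        candidates.append(best)
        # stand-in for profile building: really only the droppable build
        # dependency edges of this source package would be removed
        g.remove_node(best)
    return candidates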

Generating a build order with fewer profile built source packages After implementing all the features above, I now feel more confident to publish the current status of the tools to a wider audience. The following test shows a run of the aforementioned ./native.sh shell script. Its final output is a list of source packages which have to be profile built and a build order. Using the resulting build order, starting from a minimal build system (essential:yes, build-essential and debhelper), all source packages will be compiled which are needed to compile all binary packages in the system. The result will therefore be a list of source and binary packages which fulfill the following properties:
  • all binary packages can be built from the available source packages
  • all source packages can be built with the available binary packages
I called this a "reduced distribution" in earlier posts. The interesting property of this specific selection is that it contains the biggest problem set of Debian when bootstrapping it: a strongly connected component of 900 to 1000 nodes. Here is a visualization of the problem: [graph visualization: a hideous mess]. Source packages do not yet come with build profiles and the cross build situation can not yet be analyzed, so a number of assumptions were made. The last of these assumptions concerns about 14 build dependencies which the feedback arc set algorithm decided to break but for which other data sources did not indicate that they are actually breakable. Those 14 are dynamically generated by native.sh. If the above assumptions are not too far from the actual situation, then no more than 73 source packages have to be modified to bootstrap a reduced distribution. This reduced distribution even includes dependency-wise "big" packages like webkit, metacity, iceweasel, network-manager, tracker, gnome-panel, evolution-data-server, kde-runtime, libav and nautilus. By changing one line in native.sh one could easily produce a build order which generates gnome-desktop or really any given (meta-)package selection. All of native.sh takes only 80 seconds to execute on my system (Core i5, 2.5 GHz, single-threaded). Here is the final build order which creates 2044 binary packages from 613 source packages.
  1. nspr, libio-pty-perl, libmcrypt, unzip, libdbi-perl, cdparanoia, libelf, c-ares, liblocale-gettext-perl, libibverbs, numactl, ilmbase, tbb, check, libogg, libatomic-ops, libnl($), orc($), libaio, tcl8.4, kmod, libgsm, lame, opencore-amr, tcl8.5, exuberant-ctags, mhash, libtext-iconv-perl, libutempter, pciutils, gperf, hspell, recode, tcp-wrappers, fdupes, chrpath, libbsd, zip, procps, wireless-tools, cpufrequtils, ed, libjpeg8, hesiod, pax, less, dietlibc, netkit-telnet, psmisc, docbook-to-man, libhtml-parser-perl, libonig, opensp($), libterm-size-perl, linux86, libxmltok, db-defaults, java-common, sharutils, libgpg-error, hardening-wrapper, cvsps, p11-kit, libyaml, diffstat, m4
  2. openexr, enca, help2man, speex, libvorbis, libid3tag, patch, openjade1.3, openjade, expat, fakeroot, libgcrypt11($), ustr, sysvinit, netcat, libirman, html2text, libmad, pth, clucene-core, libdaemon($), texinfo, popt, net-tools, tar, libsigsegv, gmp, patchutils($), dirac, cunit, bridge-utils, expect, libgc, nettle, elfutils, jade, bison
  3. sed, indent, findutils, fastjar, cpio, chicken, bzip2, aspell, realpath, dctrl-tools, rsync, ctdb, pkg-config($), libarchive, gpgme1.0, exempi, pump, re2c, klibc, gzip, gawk, flex-old, original-awk, mawk, libtasn1-3($), flex($), libtool
  4. libcap2, mksh, readline6, libcdio, libpipeline, libcroco, schroedinger, desktop-file-utils, eina, fribidi, libusb, binfmt-support, silgraphite2.0, atk1.0($), perl, gnutls26($), netcat-openbsd, ossp-uuid, gsl, libnfnetlink, sg3-utils, jbigkit, lua5.1, unixodbc, sqlite, wayland, radvd, open-iscsi, libpcap, linux-atm, gdbm, id3lib3.8.3, vo-aacenc, vo-amrwbenc, fam, faad2, hunspell, dpkg, tslib, libart-lgpl, libidl, dh-exec, giflib, openslp-dfsg, ppl($), xutils-dev, blcr, bc, time, libdatrie, libpthread-stubs, guile-1.8, libev, attr, libsigc++-2.0, pixman, libpng, libssh2, sqlite3, acpica-unix, acl, a52dec
  5. json-c, cloog-ppl, libverto, glibmm2.4, rtmpdump, libdbd-sqlite3-perl, nss, freetds, slang2, libpciaccess, iptables, e2fsprogs, libnetfilter-conntrack, bash, libthai, python2.6($), libice($), libpaper, libfontenc($), libxau, libxdmcp($), openldap($), cyrus-sasl2($), openssl, python2.7($)
  6. psutils, libevent, stunnel4, libnet-ssleay-perl, libffi, readline5, file, libvoikko, gamin, libieee1284, build-essential, libcap-ng($), lcms($), keyutils, libxml2, libxml++2.6, gcc-4.6($), binutils, dbus($), libsm($), libxslt, doxygen($)
  7. liblqr, rarian, xmlto, policykit-1($), libxml-parser-perl, tdb, devscripts, eet, libasyncns, libusbx, icu, linux, libmng, shadow, xmlstarlet, tidy, gavl, flac, dbus-glib($), libxcb, apr, krb5, alsa-lib
  8. usbutils, enchant, neon27, uw-imap, libsndfile, yasm, xcb-util, gconf($), shared-mime-info($), audiofile, ijs($), jbig2dec, libx11($)
  9. esound, freetype, libxkbfile, xvidcore, xcb-util-image, udev($), libxfixes, libxext($), libxt, libxrender
  10. tk8.4, startup-notification, libatasmart, fontconfig, libxp, libdmx, libvdpau, libdrm, libxres, directfb, libxv, libxxf86dga, libxxf86vm, pcsc-lite, libxss, libxcomposite, libxcursor, libxdamage, libxi($), libxinerama, libxrandr, libxmu($), libxpm, libxfont($)
  11. xfonts-utils, libxvmc, xauth, libxtst($), libxaw($)
  12. pmake, corosync, x11-xserver-utils, x11-xkb-utils, coreutils, xft, nas, cairo
  13. libedit, cairomm, tk8.5, openais, pango1.0($)
  14. ocaml, blt, ruby1.8, firebird2.5, heimdal, lvm2($)
  15. cvs($), python-stdlib-extensions, parted, llvm-2.9, ruby1.9.1, findlib, qt4-x11($)
  16. xen, mesa, audit($), avahi($)
  17. x11-utils, xorg-server, freeglut, libva
  18. jasper, tiff3, python3.2($)
  19. openjpeg, qt-assistant-compat, v4l-utils, qca2, jinja2, markupsafe, lcms2, sip4, imlib2, netpbm-free, cracklib2, cups($), postgresql-9.1
  20. py3cairo, pycairo, pam, libgnomecups, gobject-introspection($)
  21. gdk-pixbuf, libgnomeprint, gnome-menus, gsettings-desktop-schemas, pangomm, consolekit, vala-0.16($), colord($), atkmm1.6
  22. libgee, gtk+3.0($), gtk+2.0($)
  23. gtkmm2.4, poppler($), openssh, libglade2, libiodbc2, gcr($), libwmf, systemd($), gcj-4.7, java-atk-wrapper($)
  24. torque, ecj($), vala-0.14, gnome-keyring($), gcc-defaults, gnome-vfs($)
  25. libidn, libgnome-keyring($), openmpi
  26. mpi-defaults, dnsmasq, wget, lynx-cur, ghostscript, curl
  27. fftw3, gnupg, libquvi, xmlrpc-c, raptor, liboauth, groff, fftw, boost1.49, apt
  28. boost-defaults, libsamplerate, cmake, python-apt
  29. qjson, qtzeitgeist, libssh, qimageblitz, pkg-kde-tools, libical, dwarves-dfsg, automoc, attica, yajl, source-highlight, pygobject, pygobject-2, mysql-5.5($)
  30. libdbd-mysql-perl, polkit-qt-1, libdbusmenu-qt, raptor2, dbus-python, apr-util
  31. rasqal, serf, subversion($), apache2
  32. git, redland
  33. xz-utils, util-linux, rpm, man-db, make-dfsg, libvisual, cryptsetup, libgd2, gstreamer0.10
  34. mscgen, texlive-bin
  35. dvipng, luatex
  36. libconfig, transfig, augeas, blas, libcaca, autogen, libdbi, linuxdoc-tools, gdb, gpm
  37. ncurses, python-numpy($), rrdtool, w3m, iproute, gcc-4.7, libtheora($), gcc-4.4
  38. libraw1394, base-passwd, lm-sensors, netcf, eglibc, gst-plugins-base0.10
  39. libiec61883, qtwebkit, libvirt, libdc1394-22, net-snmp, jack-audio-connection-kit($), bluez
  40. redhat-cluster, gvfs($), pulseaudio($)
  41. phonon, libsdl1.2, openjdk-6($)
  42. phonon-backend-gstreamer($), gettext, libbluray, db, swi-prolog, qdbm, swig2.0($)
  43. highlight, libselinux, talloc, libhdate, libftdi, libplist, python-qt4, libprelude, libsemanage, php5
  44. samba, usbmuxd, libvpx, lirc, bsdmainutils, libiptcdata, libgtop2, libgsf, telepathy-glib, libwnck3, libnotify, libunique3, gnome-desktop3, gmime, glib2.0, json-glib, libgnomecanvas, libcanberra, orbit2, udisks, d-conf, libgusb
  45. libgnomeprintui, libimobiledevice, nautilus($), libbonobo, librsvg
  46. evas, wxwidgets2.8, upower, gnome-disk-utility, libgnome
  47. ecore, libbonoboui
  48. libgnomeui
  49. graphviz
  50. exiv2, libexif, lapack, soprano, libnl3, dbus-c++
  51. atlas, libffado, graphicsmagick, libgphoto2, network-manager
  52. pygtk, jackd2, sane-backends, djvulibre
  53. libav($), gpac($), ntrack, python-imaging, imagemagick, dia
  54. x264($), matplotlib, iceweasel
  55. strigi, opencv, libproxy($), ffms2
  56. kde4libs, frei0r, glib-networking
  57. kde-baseapps, kate, libsoup2.4
  58. geoclue, kde-runtime, totem-pl-parser, libgweather, librest, libgdata
  59. webkit
  60. zenity, gnome-online-accounts
  61. metacity, evolution-data-server
  62. gnome-panel
  63. tracker
The final recompilation of profile built source packages is omitted. Source packages marked with a ($) are selected to be profile built. All source packages listed in the same line can be built in parallel as they do not depend upon each other. This order looks convincing as it first compiles a multitude of source packages which lack none or only a few of their build dependencies. Later steps allow fewer source packages to be compiled in parallel. The number of needed build dependencies is highest in the source packages that are built last.

13 October 2012

Johannes Schauer: Does it become harder to bootstrap Debian?

My last post explained how I retrieved and corrected data from snapshot.debian.org so that dose3 was able to parse it. In this post I will cover some surprising results I found when using my tools on those Packages and Sources files from 2005 until today. For each pair of Packages and Sources files I did the following:
  1. created a reduced distribution
  2. calculated the dependency graph
I call a reduced distribution the smallest set of binary and source packages such that all binary packages can be built from the source packages in the set and all source packages can be built with the binary packages in the set. Creating a reduced distribution first greatly increases the execution speed of my algorithms as it reduces the number of binary and source packages by an order of magnitude while still preserving the dependency cycle situation of the core packages. In many cases, once the packages of a reduced distribution are available, all the rest of Debian can be compiled from them without any dependency cycles. As also mentioned in earlier posts, there is always one central, big strongly connected component (SCC) in the dependency graph. I am especially interested in how the size of the reduced distribution and the SCC change over time as both indicate how hard it is to bootstrap Debian at that point in time.

Let's look at the plots I made from the data I gathered. The gray data points indicate that at that point in time, one or more of the core source packages (the ones in the reduced distribution) in Debian Sid was not compilable. This means that the resulting values cannot be fully trusted. But as it is mostly only a single source package that doesn't compile, it doesn't influence the overall result much and therefore I included them anyway. Red and green data points represent a fully successful run. The only thing that I do not yet understand is what happened in 2007...

So while a potential porter in 2005 only had to look at a graph of 150 nodes, he now needs to solve a graph of nearly 1000 nodes. The number of edges in the dependency graph grew even more dramatically, from about 500 to over 8000 edges. While the dependency situation for Debian Sid in 2005 can easily be printed using xdot and solved visually, this is not possible anymore in 2012. While dependencies of only a few dozen source packages had to be dropped manually in 2005, now even dropping build dependencies from a few hundred source packages doesn't solve the dependency situation. So my assumption is that due to a growing number of interdependencies between source and binary packages (as both gain more features), bootstrapping Debian for a new architecture becomes harder over time. Does this match the subjective impression of people who ported Debian in the past?

If my assumption is correct, then there is a growing need for official support of droppable build dependencies (or "stage builds" or "profile builds") to break dependency cycles during the bootstrapping process. A porter's work would be much easier if source packages already contained information about which build dependencies can be dropped (if so needed). In the best case, a machine could use those annotations to calculate a build order automatically. As one can see in the graph above, there are currently 370 source packages in the main SCC. This means that no more than this number of packages (but probably much less) have to be annotated to break the SCC into a directed acyclic graph. Discussion about what syntax to use to mark potentially droppable build dependencies currently happens in bug#661538 but should maybe be discussed by a wider audience. The currently favored solution was proposed in said bug report by Guillem Jover and is called "build profiles". It has the advantage that it is not only trivial to implement (a patch exists for dpkg and dose3 already supports them) but would also be useful for other purposes like embedded builds. The format is similar to how architecture restrictions for individual dependencies are specified but uses angle brackets:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
The work Patrick McDermott did for his GSoC project over the summer already uses the above syntax.

12 October 2012

Johannes Schauer: analyzing Packages and Sources from snapshot.debian.org

When I wanted to use my dependency graph analysis tools to analyze earlier states of Debian Sid, I naturally used snapshot.debian.org to retrieve the Packages and Sources files from which my tools retrieve the dependency information. The problem is that many of those Packages and Sources files contain syntax errors that make the dose3 parser choke, which left my tools unable to parse the affected files. The following script not only downloads all Packages and Sources files at five-day intervals (4460 MB from 2005/03/12 to 2012/10/11) but also cleans up all the syntax errors on which dose3 choked. This includes invalid version naming, architecture lists separated by commas, disjunctions in Conflicts fields and incorrect braces/bracket usage. Maybe this helps others who also want to profit from Packages and Sources files from the past. Fun fact #1: starting from June 2010, there were no more syntax errors in the Packages and Sources files of Debian Sid. Fun fact #2: starting from December 2009, there are no more mismatches between versions of binary packages in the Packages file and the versions of the corresponding source packages in the Sources file. The script is available as a gist: https://gist.github.com/3880904
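The core of the download loop is simple; here is a minimal sketch in Python (the real script in the gist also fetches Packages files and applies the syntax fixes), assuming that snapshot.debian.org resolves approximate timestamps under its /archive/debian/<timestamp>/ URL layout:

import urllib.request
from datetime import date, timedelta

day = date(2005, 3, 12)
while day <= date(2012, 10, 11):
    stamp = day.strftime("%Y%m%dT000000Z")
    url = ("http://snapshot.debian.org/archive/debian/%s/"
           "dists/sid/main/source/Sources.gz" % stamp)
    urllib.request.urlretrieve(url, "Sources-%s.gz" % stamp)
    day += timedelta(days=5)   # five day interval, as above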

10 October 2012

Johannes Schauer: Using Gentoo to find reduced build dependencies for Debian source packages

Automatically devising a build order that allows bootstrapping Debian currently fails (amongst other reasons) because of the lack of metadata about which build dependencies can potentially be dropped from source packages. If that information were available, an algorithm could decide which build dependencies to drop so that dependency cycles can be broken. Finding droppable build dependencies of a source package is something only humans can do, because it involves manually analyzing and testing the build system of a source package. Build systems are neither uniform nor do they encode their dependencies in a way that can directly be mapped to Debian packages. Therefore they are not machine readable. One idea to solve the dilemma is to find a Linux distribution that records which dependencies of its source packages are optional and only needed for certain features. If such a distribution can be found, then the information from it can be used to find dependencies that can also be dropped from Debian source packages. Gentoo is a distribution that fulfills this requirement through so-called USE flags which allow features to be enabled or disabled during compilation. Dependencies of Gentoo source packages are stored in .ebuild files that control the build process. Since .ebuild files are bash scripts, parsing them is not trivial. I therefore used the emerge software package to extract that information. Thanks to the well written emerge code and to quick help in the Gentoo IRC channel, it didn't take long to make the code run on Debian. My source code is downloadable here: https://gitorious.org/debian-bootstrap/gen2deb Before I list the results of using Gentoo USE flags to determine dependencies that can potentially be dropped from Debian source packages, let me list the problems that this method entails.

Only package name matching, no version matching When writing the mapping from Debian to Gentoo packages and back, I discard version information. There are just too many versions that exist in either Debian or Gentoo but are not present in the other. So the assumption is that Debian Sid and Gentoo both carry the most recent major versions of upstream software, which have roughly the same requirements in terms of build dependencies.

Gentoo packages are matched to Debian source packages In Gentoo there are only source packages and no binary packages, so I map Gentoo packages to Debian source packages. But Gentoo source packages build depend on other source packages while Debian source packages depend on binary packages. So at some point I have to translate Gentoo packages to Debian source packages and those source packages to Debian binary packages. I do this by analyzing the original binary package build dependencies of a Debian source package and then marking those binary packages as droppable that are built by the Debian source packages which were found to be droppable.
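Expressed as a sketch in Python (the data structures are hypothetical; the real mapping lives in gen2deb), this translation step amounts to a simple set filter:

def droppable_binary_deps(build_deps, src_of_bin, droppable_sources):
    # build_deps: binary packages in the Debian source package's Build-Depends
    # src_of_bin: maps a binary package to the source package building it
    # droppable_sources: Debian source packages found droppable via Gentoo
    return {b for b in build_deps
            if src_of_bin.get(b) in droppable_sources}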

Not the exact same package set There is some software that is only in Gentoo and some that is only in Debian. Debian and Gentoo also split some source packages differently.

Gentoo has more direct dependencies Many build dependencies in Debian are pulled in indirectly through dependencies of direct build dependencies. In Gentoo, source packages directly depend on most things they need to build successfully. This leads to the list of dependencies in Gentoo being much larger than the list of dependencies in the corresponding Debian source package. It also means that many dependencies that can be dropped in Gentoo are not found to be droppable in Debian because they are not direct dependencies of that source package.

There are no implicit dependencies Gentoo will often drop dependencies that are essential or build-essential packages in Debian and therefore implicit build dependencies that cannot be dropped.

Result Despite the many problems, the result doesn't look too wrong. I got some Debian source packages that were found to have droppable build dependencies from Thorsten Glaser, and all dependencies that Gentoo found to be droppable were also dropped by him. To put everything into numbers: the current 912-node SCC in Debian Sid can be reduced to six individual SCCs with 422, 5, 5, 3, 2 and 2 nodes each. So using Gentoo cuts the size of the central component by more than half. Surely, there will be a number of dependencies that were found to be droppable in Gentoo but are actually not droppable in Debian. The point is that it is better to have "some" data, even if it contains false positives, than no data at all. It is easier for a human to verify whether some suggested droppable build dependencies are actually correct than to go through hundreds of source packages with thousands of dependencies manually.

28 September 2012

Johannes Schauer: visit at Paris IRILL

Last week, I was invited to give a talk about my Debian bootstrapping efforts at IRILL in Paris. The slides of my talk are online as pdf. The time I spent in Paris with Pietro Abate was very fruitful. I have to thank him and Roberto di Cosmo for inviting me and even compensating my travel expenses. We managed to implement several things during my visit. The two most important ideas (in my opinion) that we came up with for future implementation were to harvest reduced build dependency information from Gentoo as well as finding a flaw in the way the current dependency graph relates to binary packages and their installation sets. I am still busy with evaluating output from my trials with Gentoo, so I will cover this topic in a later blog post once I have generated the actual impact on the dependency graph. The current status is that Gentoo USE flags allow me to find possibly droppable build dependencies for 250 out of the 350 interesting Debian source packages that are part of the main SCC.

Pietro also found a flaw in how the dependency graph is currently generated. While source packages A and B might both depend on binary package C (and its installation set), it is wrong to add a dependency to C from both source packages without further verification. Due to virtual packages and disjunctive dependencies, C might have many possible installation sets, and only one of them is chosen in the current code. The problem is that this one chosen set might conflict with the other build dependencies of source packages depending on C. Therefore, there must exist multiple binary package nodes for C, each with a different installation set, dynamically generated as they are needed. Source packages must point to the node for C that possesses an installation set which doesn't conflict with their own build dependencies.

We also collected further TODO notes that I will be implementing. As I cannot always depend on new dose3 versions being pushed to Debian Sid right after their release, I will do the dose3 git submodule integration over the weekend. This will allow me to evaluate the results I got from my evaluation of Gentoo USE flags that I gathered over the past week.
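One way to express Pietro's fix is to key binary package nodes by their installation set and create nodes on demand; a sketch in Python, where the helper names and the solver interface are invented for illustration (the real implementation uses dose3):

def bin_node(graph, cache, pkg, build_deps, solve):
    # solve() stands in for a dependency solver returning an installation
    # set of pkg that does not conflict with the given build dependencies
    iset = solve(pkg, avoid_conflicts_with=build_deps)
    key = (pkg, frozenset(iset))
    if key not in cache:
        graph.add_node(key)      # a fresh node for this installation set
        cache[key] = key
    return key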

8 August 2012

Johannes Schauer: Bootstrappable Debian - How to help

TLDR: multiarch, multiarch, multiarch, cross buildability, staged build dependencies, wiki page, corrections/hints/requests to debian-bootstrap at lists.mister-muffin.de

This summer (and this year's GSoC) is nearing its end, and to make it easier for people to make use of the information my tools have produced so far, I created a page in the Debian wiki. It lists not only the open issues I see but also statistics that I gathered using the output of my GSoC project. I want to use this blog post to make people aware of that page as well as to get some feedback on it and anything related to it.

The biggest blocker my tools face is that many packages are still missing multiarch information. As long as at least the basic packages do not have their cross build dependencies satisfied via multiarch for an existing foreign architecture, automated tools cannot properly analyze the dependency situation in the bootstrapping case, when many packages of the new foreign architecture do not even exist yet. If Debian is supposed to be bootstrappable, then the first stage is to make a set of basic packages cross compile for an existing foreign architecture. Once this is possible, a tool of mine can analyze the cyclic build dependency situation that might occur when cross compiling for an architecture that does not exist yet. Then, staged cross builds can be used to cross compile a minimal foreign system. Due to missing multiarch classification, it is not yet known how big the cyclic build dependency situation is for the base packages. It is not only the conversion of packages to multiarch that is needed but also the adding of the :any (and in rare cases :native) qualifier to build dependencies on M-A: allowed packages. Prominent build dependencies that should be (but are not yet) M-A: allowed are python and gettext. Both are needed as a build dependency by many packages of the base system. Unfortunately, wanna-build does not understand qualifiers like :any and :native yet. Until it does, no package can be marked :any or :native and cross compilation of many base packages cannot succeed.

Once the point is reached where a base system can be cross compiled from nothing, native compilation can start. Since native compilation doesn't depend on multiarch, the dependency situation when trying to natively compile all of Debian from nothing is understood much better. Unfortunately, the cyclic build dependency situation is also much worse in the native case and there exists a big 1000-node strongly connected component of binary and source packages that all interdepend on each other. This dependency mess can be solved using three approaches: moving build dependencies to Build-Depends-Indep, staged builds with some build dependencies dropped, and cross building some source packages. The wiki page gives many hints on how to find packages that each method can be applied to. Stage building is a tool that might be useful for cross building (we don't know for sure yet) but is definitely needed for native compilation: after all possible dependencies are moved to Build-Depends-Indep, the only other alternative to stage building for breaking dependency cycles is to cross build source packages. Since building a package "staged", that is without one of its build dependencies, is often much easier than making the package in question cross compile, it is the preferred alternative. Once more packages have been made multiarch, it might be possible to prove that there is no alternative to introducing a notion of staged builds.
Some people (Wookey, Patrick McDermott, Guillem Jover, myself) decided that the following format to mark staged build dependencies would be preferred over others:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
The <> format was proposed by Guillem Jover in bug#661538. Patches for dpkg and dose3 are done. More people need to discuss this format before a final decision on how to indicate staged build dependencies can be made. For more information on the topic, have a look at the corresponding wiki page. Feel free to direct any comments/critique/hints to debian-bootstrap at lists.mister-muffin.de or directly to me.

30 July 2012

Johannes Schauer: port bootstrap build-ordering tool report 4

A copy of this post is sent to soc-coordination@lists.alioth.debian.org as well as to debian-bootstrap@lists.mister-muffin.de.

Diary

July 2
  • playing around with syntactic dependency graphs and how to use them to flatten dependencies

July 4
  • make work with dose 3.0.2
  • add linux-amd64 to source architectures
  • remove printing in build_compile_rounds
  • catch Not_found exception and print warning
  • use the whole installation set in crosseverything.ml instead of flattened dependencies
  • detect infinite loop and quit in crosseverything.ml
  • use globbing in _tags file
  • use wildcards and patsubst in makefile

July 5
  • throw a warning if there exist binary packages without source packages
  • add string_of_list and string_of_pkglist and adapt print_pkg_list and print_pkg_list_full to use them
  • fix and extend flatten_deps - now also tested with Debian Sid

July 6
  • do not exclude the crosscompiled packages from being compiled in crosseverything.ml
  • clean up basebuildsystem.ml, remove old code, use BootstrapCommon code
  • clean up basenocycles.ml, remove unused code and commented out code
  • add option to print statistics about the generated dependency graph
  • implement most_needed_fast_wrong as well as most_needed_slow_correct and make both available through the menu

July 7
  • allow to investigate all scc, not only the full graph and the scc containing the investigated package
  • handle Not_found in src_list_from_bin_list with warning message
  • handle the event of the whole archive actually being buildable
  • replace raise Failure with failwith
  • handle incorrectly typed package names
  • add first version of reduced_dist.ml to create a self-contained mini distribution out of a big one

July 8
  • add script to quickly check for binary packages without source package
  • make Debian Sid default in makefile
  • add *.d.byte files to .gitignore
  • README is helpful now
  • more pattern matching and recursiveness everywhere

July 9
  • fix termination condition of reduced_dist.ml
  • have precise as default ubuntu distribution
  • do not allow to investigate an already installable package

July 10
  • milestone: show all cycles in a graph
  • add copyright info (LGPL3+)

July 11
  • advice to use dose tools in README

July 16
  • write apt_pkg based python filter script replacing grep-dctrl

July 17
  • use Depsolver.listcheck more often
  • add dist_graph.ml
  • refactor dependency graph code into its own module

July 18
  • improve package selection for reduced_dist.ml
  • improve performance of cycle enumeration code

July 20
  • implement buildprofile support into dose3

July 22
  • let dist_graph.ml use commandline arguments

July 23
  • allow dose3 to generate source package lists without Build-Depends-Indep and Build-Conflicts-Indep

July 29
  • implement crosscompile support into dose3

Results

Readme There is not yet a writeup on how everything works and how all the pieces of the code work together, but the current README file provides a short introduction on how to use the tools. It covers:
  • build and runtime dependencies
  • compile instructions
  • execution examples for each program
  • step-by-step guide on how to analyze the dependency situation
  • explanation of general commandline options
A detailed writeup about the inner workings of everything will be part of a final documentation stage.

License All my code is now released under the terms of the LGPL, either version 3, or (at your option) any later version. A special linking exception is made to the license, which can be read at the top of the provided COPYING file. The exception is necessary because OCaml links statically, which means that without that exception, the conditions of distribution would basically equal GPL3+.

reduced_dist.ml The Debian archive in particular is huge, and one might want to work on a reduced selection of packages first. A smaller selection of the archive is significantly faster to process and also does not add thousands of packages that are not important for an extended base system. I call a reduced distribution a set of source packages A and a set of binary packages B which fulfill the following properties:
  • all source packages A must be buildable with only binary packages B being available
  • all binary packages B except for architecture:all packages must be buildable from source packages A
The set of binary packages B and source packages A can be retrieved using the reduced_dist program. It allows either building the most minimal reduced distribution or one that includes a certain package selection. To filter the package control stanzas for a reduced distribution out of a full distribution, I originally used a call to grep-dctrl but later replaced that with a custom Python script called filter-packages.py. This script uses python-apt to filter Packages and Sources files for a certain package selection.
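A rough equivalent of what such a filter does, sketched with python-apt (the exact interface of filter-packages.py may differ; the folding of multiline fields is glossed over here):

import sys
import apt_pkg

apt_pkg.init()
wanted = set(sys.argv[1:])          # package names to keep
with open("Packages") as f:
    for stanza in apt_pkg.TagFile(f):
        if stanza["Package"] in wanted:
            for key in stanza.keys():
                sys.stdout.write("%s: %s\n" % (key, stanza[key]))
            sys.stdout.write("\n")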

dist_graph.ml It soon became obvious that there were not many independent dependency cycle situations but just one big SCC that contains 96% of the packages involved in build dependency cycles. Therefore it made sense to write a program that does not iteratively build the dependency graph starting from a single package, but which builds a dependency graph for a whole archive.

Cycles I can now enumerate all cycles in the dependency graph. I covered the theoretical part in another blog post and wrote an email about the achievement to the list. Both resources contain more links to the respective source code. The dependency graph generated for Debian Sid has 39486 vertices. It has only one central SCC with 1027 vertices and only eight other SCCs with 2 to 7 vertices. All the other source and binary packages in the dependency graph for the archive are degenerate components of size one. Obtaining the attached result took 4 hours on my machine (Core i5 @ 2.53GHz). 1.5 hours of that were needed to build the dependency graph, the other 2.5 hours were needed to run Johnson's algorithm on the result. Memory consumption of the program was at about 700 MB. To my joy, both the runtime of the cycle finding algorithm for a whole Debian Sid repository and its memory requirements are apparently within orders of magnitude that are justifiable when run on off-the-shelf hardware. It must also be noted that nothing is optimized for performance yet. A list of all cycles in Debian Sid up to length 4 can be retrieved from this email. This cycle analysis assumes that only essential packages, build-essential with its dependencies and debhelper are available. Debhelper is not an essential or build-essential package but 79% of the archive build-depends on it. The most interesting cycles are probably those of length 2 that need packages that they build themselves. Noticeable examples for these situations are vala, python, mlton, fpc, sbcl and ghc. Languages, it seems, love to need themselves to be built.
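Those length-two cycles are easy to spot once the graph is available; a sketch in Python with networkx (the GraphML dump and its node naming are hypothetical):

import networkx as nx

g = nx.read_graphml("sid-depgraph.graphml")
# a cycle of length two is simply a pair of mutual edges, e.g. a source
# package build-depending on a binary package it builds itself
two_cycles = [(u, v) for u, v in g.edges() if g.has_edge(v, u) and u < v]
for u, v in two_cycles:
    print(u, "<->", v)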

Buildprofiles There is a long discussion of how to encode staged build dependency information in source packages. While the initial idea was to use Build-Depends-StageN fields, this solution would duplicate large parts of the Build-Depends field, which leads to bitrot, and it is also inflexible with respect to possible other build "profiles". To remedy the situation it was proposed to use field names like Build-Depends[stage1 embedded], but this would also duplicate information and would break the rfc822 format of package description files. A document maintained by Guillem Jover gives even more ideas and details. Internally, Patrick and I settled on another idea of Guillem Jover to annotate staged build dependencies. The format reads like:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
So each build profile follows a dependency in <> "brackets" and has a format similar to architecture options. Patrick has a patch for dpkg that implements this functionality while I patched dose3.
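To illustrate the semantics of the negated form shown above, here is a toy parser in Python (the real parsing lives in the dpkg and dose3 patches; positive profile terms are not handled):

import re

def dep_active(dep, active_profiles):
    # dep: one dependency, e.g. "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>"
    m = re.search(r"<([^>]*)>", dep)
    if m is None:
        return True                       # unconditional build dependency
    for term in m.group(1).split():
        if term.startswith("!") and term[1:] in active_profiles:
            return False                  # dropped when this profile is active
    return True

# dep_active("huge (>= 1.0) [i386 arm] <!embedded !bootstrap>", {"bootstrap"})
# evaluates to False: the dependency is dropped in a bootstrap build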

Dropping Build-Depends-Indep and Build-Conflicts-Indep When representing the dependencies of a source package, dose3 concatenates its Build-Depends and Build-Depends-Indep dependency information. So up to now, a source package could only be compiled if it managed to compile all of its binary packages including architecture:all packages. But when bootstrapping a new architecture, it should be sufficient to only build the architecture dependent packages and therefore to only build the build-arch target in debian/rules and not the build-indep target. Only considering the Build-Depends field and dismissing the Build-Depends-Indep field reduced the main SCC from 1027 vertices to 979 vertices. The number of cycles up to length four dropped from 276 to 206. Especially the numbers of cycles containing gtk-doc-tools, doxygen, debiandoc-sgml and texlive-latex-base shrank considerably. Patrick managed to add a Build-Depends-Indep field to four packages so far, which reduced the SCC by another 14 vertices down to 965. So besides staged build dependencies and cross building there is now a third method that can be applied to break dependency cycles: add Build-Depends-Indep information to packages or update existing information. I submitted a list of packages that have a binary-indep and/or a build-indep target in their debian/rules to the list. I also submitted a patch for dose3 which makes it possible to ignore Build-Depends-Indep and Build-Conflicts-Indep information.

Dose3 crossbuilding So far I only looked at dependency situations in the native case. While the native case contains a huge SCC of about 1000 packages, the dependency situation will be much nicer when cross building. But dose3 was so far not able to simulate cross building of source packages. I wrote a patch that implements this functionality and will allow me to write programs that help analyze the cross building situation as well.

Debconf Presentation Wookey gave a talk at DebConf12 for which I supplied him with slides. The slides in their final version can be downloaded here.

Future Patrick maintains a list of "weak" build dependencies. Those are dependencies that are very likely to be droppable in either a staged build or using Build-Depends-Indep. I must make use of this list to make it easier to find packages whose dependencies can easily be dropped. I will have to implement support for resolving the main SCC using staged build dependencies. Since it is unlikely that Patrick will be fast enough in supplying me with modified packages, I will need to create a database of dummy packages myself. Another open task is to allow analyzing the cross building dependency situation. What I'm currently more or less waiting on is the inclusion of my patches into dose3 as well as a decision on the buildprofile format. More people need to discuss it before it can be included in tools as well as policy. Every maintainer of a package can help make bootstrapping easier by making sure that as many dependencies as possible are part of the Build-Depends-Indep field.
